ADAPTIVE MULTI-PASS SPEECH RECOGNITION SYSTEM

The present invention relates to speech recognition. More particularly, the present invention relates to an adaptive multi-pass speech recognition system which performs single, double or multi-pass speech recognition to achieve a desired confidence in the speech recognition process.
Speech recognition systems are known which permit a user to interface with a computer system using spoken language. A speech recognition system receives spoken input from the user, interprets the input, and then translates the input into a form that the computer system understands. More particularly, spoken input in the form of an analog waveform is digitally sampled. The digital samples are then processed by the speech recognition system according to a speech recognition algorithm which identifies words or utterances of the spoken input by reference to previously obtained acoustic models based upon samples of speech.

An example of a known speech recognition technique is word-level template matching. During word-level template matching, the spoken input is compared to pre-stored templates which represent various words. A template which most closely matches the spoken input is selected as the output. Another example of a known speech recognition technique is acoustic-phonetic recognition. According to acoustic-phonetic recognition, the spoken input is segmented and identified according to basic units of speech sound (phonemes).
Another known speech recognition technique is stochastic speech recognition. According to stochastic speech recognition, the spoken input is converted into a series of parameter values which are compared to pre-stored models. For example, the pre-stored models can be Hidden Markov Models (HMMs) which use Gaussian Mixture Models (GMMs) to model short-term acoustic observation probabilities. The GMMs and HMMs are obtained for phonemes by taking samples of spoken words or sentences and then representing the speech as parameter values. Known algorithms for probabilistic analysis are the Baum-Welch maximum likelihood algorithm and the Viterbi algorithm.

A typical characteristic of such known speech recognition systems is contention between processing time and recognition accuracy. Thus, a speech recognition system which is pre-configured for an acceptable level of accuracy is often accompanied by unacceptable delay or processing power requirements to recognize speech, whereas a speech recognition system which is pre-configured for an acceptable speed is often accompanied by unacceptable error levels.

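A contemplated solution to this contention between recognition speed and accuracy has been two-pass speech recognition. A two-pass speech recognition system processes spoken input according to two speech recognition algorithms in succession. Fig. 1 illustrates a flow diagram for a two-pass speech recognition system according to the prior art. Program flow begins in a start state. Then, spoken input is received. During a first pass in a state 104, a high speed, but relatively low accuracy, speech recognition technique is performed. This first pass produces several alternative matches for the spoken input. During a second pass, a low speed, but relatively high accuracy, speech recognition technique is performed on the alternatives produced by the first pass. The results are outputted in a state 108 and, then, program flow terminates in a state 110.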
The second pass was not expected to unduly delay recognition or require undue processing power. In practice, however, for a given accuracy, the total processing time required by such two-pass systems tends to be longer than desired.

Similarly, U.S. Patent No. 5,515,475 describes a two-pass speech recognition method. For a given accuracy, the total processing time also tends to be longer than desired.

Therefore, what is needed is a technique for increasing recognition speed while maintaining a high degree of recognition accuracy in a speech recognition system.

Summary of the Invention

The invention is a method and apparatus for an improved multi-pass speech recognition system. The system includes an input device coupled to a source of spoken input for receiving the spoken input. A processor coupled to the input device performs a first pass speech recognition technique on the spoken input. The first pass results can include a number of alternative speech expressions, each having an assigned score representative of the certainty that the expression correctly matches the spoken input. As an alternate to returning such a list of alternative speech expressions, the first pass can return a graph of alternatives together with probabilities or certainties and/or language model scores, from which scores can be computed given the graph and recognition models. The scores for alternative expressions, and differences between such scores, can be used to determine whether to perform another speech recognition pass.

As an example using certainties, assume the spoken input is the word "Boston". The results of the first pass could be a certainty of fifty-five percent (55%) assigned to the expression: "Austin"; a certainty of forty percent (40%) assigned to the alternative expression: "Boston"; and a certainty of five percent (5%) assigned to some other expression or expressions.

The processor selectively performs a second pass speech recognition technique on the spoken input according to the results of the first pass. In the preferred embodiment, the second pass speech recognizing technique attempts to match the spoken input to only those expressions which were identified during the first pass as likely candidates. If one of the expressions identified by the first pass is assigned a certainty that exceeds a predetermined threshold (e.g., 95%), a second pass is not performed.

Preferably, the first pass is performed by a simpler speech recognition technique which narrows the possibilities for the spoken input, while the second pass is performed only when necessary and by a more complex speech recognition technique which operates on only the narrowed possibilities. Because the second pass is performed only when necessary to achieve a desired accuracy, the speech recognition system in accordance with the present invention recognizes speech with a faster average speed for a given accuracy in comparison to prior systems.

In a preferred embodiment, the first pass is performed on the spoken input. Thereafter, it is determined whether to perform a second pass. The second pass speech recognition technique is selected from a plurality of speech recognition techniques based upon the results of the first pass, such as a characteristic of the spoken input or a type of telephone channel the speaker is calling from. In which case, the plurality of speech recognition techniques can include one specific to female speakers, one specific to male speakers and one specific to callers via a hands-free telephone. If the first pass is unsuccessful at identifying the characteristic, multiple ones of the plurality of speech recognition techniques can be selected for the second pass and their results combined.
The first pass can recognize words and their phonetic alignments. Processing capability is selectively allocated as needed. In comparison to prior systems, the invention reduces the average time taken to recognize spoken input while maintaining a high degree of accuracy.



Brief Description of the Drawings

Fig. 1 illustrates a flow diagram for a two-pass speech recognition system according to the prior art.

Fig. 2 illustrates a speech recognition system in accordance with the present invention in conjunction with a source of speech.

Fig. 3 illustrates a flow diagram for a multi-pass speech recognition system in accordance with the present invention.

Fig. 4 illustrates a first alternate flow diagram for a multi-pass speech recognition system in accordance with the present invention.

Fig. 5 illustrates a second alternate flow diagram of a multi-pass speech recognition system in accordance with the present invention.

Fig. 6 illustrates a third alternate flow diagram of a multi-pass speech recognition system in accordance with the present invention.

Detailed Description

Fig. 2 illustrates a speech recognition system 200 in accordance with the present invention in conjunction with a source of speech 250. The speech recognition system 200 includes a general purpose processor 202, a system memory 204, a mass storage medium 206, and input/output devices 208, all of which are interconnected by a system bus 210. The processor 202 operates in accordance with machine readable computer software code stored in the system memory 204 and mass storage medium 206 so as to implement the present invention. The input/output devices 208 can include a display monitor, a keyboard and an interface coupled to the source of speech 250 for receiving spoken input therefrom. Though the speech recognizing system 200 illustrated in Fig. 2 is implemented as a general purpose computer, it will be apparent that the speech recognizing system 200 can be implemented in other forms, such as a special purpose computer.

The source of speech 250 converts speech into an electrical signal which is provided to the speech recognizing system 200 for recognition. The source of speech 250 can be a telephone system. In which case, a caller's speech is provided to the speech recognizing system 200 for recognition. The speech recognizing system 200 can operate in accordance with a service application program stored in the memory 204 or mass storage 206.

As an example, the service application program can respond to a telephone caller's speech by providing information regarding flight availability and pricing for a particular airline and by allowing the caller to purchase tickets utilizing spoken language and without requiring human assistance. As another example, the service application program can respond to the caller's speech by providing banking or other financial services to the caller.

In the preferred embodiment, the service application program prompts the caller to respond by answering a series of questions. For example, questions for an airline reservation system can include: "What city do you wish to depart from?"; "What is your desired departure date?"; "What is your desired destination city?"; "On what date do you wish to return?"; and "Do you prefer a window seat or an aisle seat?". Examples of questions for a banking system can






PATENT 

Atty. Docket No. NT1AN-00700 



include: "What is your account number?"; "Do you wish to obtain your account balance, transfer a balance or withdraw funds?"; "What is the amount you wish to transfer?"; and "Do you wish to perform another transaction?". It will be understood, however, that the particular service application program utilized and questions posed are not material to the present invention and that various different service application programs and questions can be utilized in connection with the invention.

The invention is a method and apparatus for an improved multi-pass speech recognition system. Fig. 3 illustrates a flow diagram for a multi-pass speech recognition system (also referred to as a speech recognizer) in accordance with the present invention. The flow diagram of Fig. 3 illustrates graphically operation of the speech recognizing system 200 illustrated in Fig. 2 in accordance with the present invention. Program flow begins in a start state 300. From the state 300, program flow moves to a state 302. In the state 302, the speech recognizing system 200 receives spoken input from the source of speech 250. Then program flow moves to a state 304.

In the state 304, a first pass is made during which the spoken input is processed by the speech recognizing system 200 according to a first speech recognizing technique. In the preferred embodiment, the first pass is performed by the speech recognizing system 200 while the speech is still being received from the source of speech 250. This tends to minimize delay in comparison to performing the first pass after the spoken input is received, though it will be apparent that the first pass can alternately be performed after the spoken input is received.

Program flow moves from the state 304 to a state 306. In the state 306, a determination is made as to whether a score associated with the results of the first pass performed in the state 304 exceeds a predetermined threshold. For example, assuming the technique utilized during the first pass is a stochastic speech recognizing algorithm, a result of this first pass can be a number of alternative speech expressions with each alternative expression having an assigned score. The assigned score is a probability or is related to the probability that the corresponding expression correctly corresponds to the spoken input. As a first example, assume the spoken input is the word "Boston". In which case, the results of the first pass could be a certainty of fifty-five percent (55%) assigned to the expression: "Austin"; a certainty of forty percent (40%) assigned to the alternative expression: "Boston"; and a certainty of five percent (5%) assigned to some other expression or expressions. As a second example, assume the spoken input is the words "account balance". In which case, the results of the first pass could be a certainty of ninety-six percent (96%) assigned to the expression: "account balance"; a certainty of two percent (2%) assigned to the alternative expression: "transfer balance"; and a certainty of two percent (2%) assigned to some other expression or expressions.

It is expected the number of alternative expressions and corresponding scores will vary widely as they will depend upon a number of varying factors. Examples of such factors can include the speaker's pitch, accent, and enunciation, similarities and differences between the phonetic sounds of various spoken words, the quality of a telephone connection between the speaker and the voice recognition system 200, and so forth.

In the first example, the results of the first pass are insufficient to correctly identify the spoken input, whereas, in the second example, the results of the first pass are sufficient to identify the spoken input with a certainty of ninety-six percent (96%). These situations are distinguished in the state 306 by comparing the score assigned to the alternative expressions to a predetermined threshold. For example, the predetermined threshold can be set at a certainty of ninety-five percent (95%), though it will be apparent that the predetermined threshold can be set at another level. In the first example, no alternative expression is assigned a certainty of ninety-five percent (95%) or higher. Therefore, the determination made in the state 306 is negative. In which case, program flow moves to a state 308.
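The comparison made in the state 306 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `needs_second_pass`, the dictionary representation of the first pass results, and the "(other)" placeholder key are hypothetical and do not come from the specification; the scores and the 95% threshold are the ones used in the two examples above.

```python
from typing import Dict


def needs_second_pass(scores: Dict[str, float], threshold: float = 0.95) -> bool:
    """Return True when no alternative reaches the threshold.

    A True result corresponds to a negative determination in the
    state 306, so that a second pass is performed.
    """
    return max(scores.values()) < threshold


# First example: spoken input "Boston", first pass is uncertain.
first_example = {"Austin": 0.55, "Boston": 0.40, "(other)": 0.05}

# Second example: spoken input "account balance", first pass is confident.
second_example = {"account balance": 0.96, "transfer balance": 0.02, "(other)": 0.02}

print(needs_second_pass(first_example))   # True: move on to a second pass
print(needs_second_pass(second_example))  # False: output the first pass result
```

The threshold is a parameter rather than a constant because the text notes that it can be set at another level.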

Note that the determination made in the state 306 can be based in another manner upon the assigned scores or certainty levels. For example, the determination can include calculation of a difference between the highest score or certainty assigned to an expression and the next highest score or certainty assigned to an alternative expression. The difference can then be compared to a predetermined threshold. When the difference does not exceed the threshold, the difference determination is negative.

In the state 308, a second pass is made during which the spoken input is processed by the speech recognizing system 200 according to a second speech recognizing technique. In the preferred embodiment, the second pass speech recognizing technique attempts to correctly match the spoken input to only those expressions which were identified during the first pass as likely candidates. Thus, the second pass attempts to determine whether the spoken input was "Austin" or "Boston". Alternately, the second pass can start from scratch by performing a speech recognition technique which yields more accurate results than the first pass. The results of the second pass are then outputted and program flow terminates in a state 312.

Returning to the second example, because the expression "account balance" was assigned a certainty of ninety-six percent (96%) as a result of the first pass, performance of a second pass is not likely to result in the selection of an alternate expression which is a more likely candidate. Therefore, the results of the first pass are outputted without performing a second pass. This is especially beneficial because the second pass would generally be performed after the spoken input is received, so that the second pass contributes directly to delay. The invention performs a second pass only when necessary, as determined by the results of the first pass. It is expected that most of the time the first pass recognition will be sufficiently certain that a second pass is unnecessary.
This is in contrast to prior systems which always perform a second pass.

Accordingly, in the state 306, the certainties assigned to the alternative expressions are compared to the predetermined threshold. Because the expression "account balance" was assigned a certainty which exceeds the predetermined threshold, the determination made in the state 306 is positive. Program flow moves to a state 314. In the state 314, the expression assigned the highest certainty during the first pass is outputted. In the example, the expression "account balance" is selected as the output. Program flow then terminates.

In sum, the first pass is performed in the state 304 by a simpler speech recognition system which narrows the possibilities, while the second pass is performed in the state 308 only when necessary and by a more complex speech recognition system which operates on narrowed possibilities. Because the second pass is performed only when necessary, the speech recognition system 200 in accordance with the present invention recognizes speech with a faster average speed for a given accuracy than prior systems.
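The overall flow of Fig. 3 can be sketched as follows. This is a minimal sketch, not the implementation described by the specification: `recognize_adaptive`, `toy_first_pass`, `toy_second_pass` and the `n_candidates` parameter are hypothetical names, the two recognizers are stand-in lookup tables rather than real speech recognizers, and the second pass scores (80% and 20%) are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]


def recognize_adaptive(
    audio: str,
    first_pass: Callable[[str], Scores],
    second_pass: Callable[[str, List[str]], Scores],
    threshold: float = 0.95,
    n_candidates: int = 2,
) -> Tuple[str, int]:
    """Return (recognized expression, number of passes performed)."""
    first = first_pass(audio)                         # state 304
    best = max(first, key=first.get)
    if first[best] >= threshold:                      # state 306 positive
        return best, 1                                # state 314
    # State 308: restrict the second pass to the likely candidates.
    ranked = sorted(first, key=first.get, reverse=True)
    second = second_pass(audio, ranked[:n_candidates])
    return max(second, key=second.get), 2


def toy_first_pass(audio: str) -> Scores:
    """Stand-in for a fast, less accurate recognizer."""
    table = {
        "boston": {"Austin": 0.55, "Boston": 0.40, "(other)": 0.05},
        "balance": {"account balance": 0.96, "transfer balance": 0.02, "(other)": 0.02},
    }
    return table[audio]


def toy_second_pass(audio: str, candidates: List[str]) -> Scores:
    """Stand-in for a slower, more accurate recognizer (invented scores)."""
    detailed = {"Austin": 0.20, "Boston": 0.80}
    return {c: detailed.get(c, 0.0) for c in candidates}


print(recognize_adaptive("boston", toy_first_pass, toy_second_pass))
print(recognize_adaptive("balance", toy_first_pass, toy_second_pass))
```

In this toy run only the uncertain input incurs the cost of the second pass, which is the source of the faster average speed claimed above.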

Fig. 4 illustrates a first alternate flow diagram for a multi-pass speech recognition system in accordance with the present invention. The flow diagram of Fig. 4 illustrates graphically operation of the speech recognizing system 200 illustrated in Fig. 2 in accordance with the present invention. Program flow begins in a start state and moves to a state 402. In the state 402, the speech recognizing system 200 receives spoken input from the source 250. Then program flow moves to a state 404. In the state 404, a first pass is made during which the spoken input is processed by the speech recognizing system 200 according to a first speech recognizing technique.

Program flow moves from the state 404 to a state 406. In the state 406, a determination is made as to characteristics of the spoken input based upon the results of the first pass. This determination is then utilized to select a most appropriate speech recognition technique (or a most appropriate speech recognizing system) from a plurality of speech recognizing techniques (or speech recognizing systems) for performing a second pass on the spoken input. For example, after the first pass, an attempt can be made to distinguish speakers who are female from speakers who are male. Alternately, after the first pass, an attempt can be made to distinguish the caller's channel type. For example, the channel type can be callers who are calling from a hands-free type of telephone, callers from a handset type of telephone or callers via some other type of communication device or media which tends to alter the caller's voice in a characteristic manner. Hands-free telephones are also referred to as speaker phones. Still further, after the first pass, an attempt can be made to distinguish the speaker's accent or dialect. For example, an attempt can be made to determine whether the speaker is speaking English associated with the United Kingdom or the United States.

In the preferred embodiment, the spoken input is placed into one of two categories: (1) originating from a female speaker; and (2) originating from a male speaker. The spoken input, however, can be placed into other categories or another number of categories. For example, as shown in Fig. 4, the spoken input can be categorized as one of three categories: (1) originating from a female speaker; (2) originating from a male speaker; and (3) originating from a hands-free telephone where the speaker is female or male.

Assuming the determination made in the state 406 is that the speaker is female, then program flow moves from the state 406 to a state 408. In the state 408, a second pass is performed on the spoken input according to a speech recognition technique which is specifically tailored to perform recognition of speech originating from female speakers. For example, the second pass technique performed in the state 408 can be based upon templates of words or utterances formed by taking samples of speech of women or, in the case of a stochastic technique, upon models formed by taking samples of speech of women. Accordingly, the speech recognition technique performed in the state 408 is specifically tailored to recognize the speech of female speakers (and, preferably, those who are not calling from a hands-free telephone). As a result, the technique can perform speech recognition in less time for a given accuracy in comparison to a technique which is general to all possible speakers or may even result in higher accuracy independent of computation time.

From the state 408, program flow moves to a state 410 where the results of the second pass are outputted. Program flow then terminates in a state 412.

Assuming the determination made in the state 406 is that the speaker is male, then program flow moves from the state 406 to a state 414. In the state 414, a second pass is performed on the spoken input according to a speech recognition technique which is specifically tailored to perform recognition of speech originating from male speakers. Because the speech recognition technique performed in the state 414 is specifically tailored to male speakers (and, preferably, those who are not calling from a hands-free telephone), the technique can perform speech recognition in less time for a given accuracy or with higher accuracy in comparison to a technique which is not so tailored.

From the state 414, program flow moves to the state 410 where the results of the second pass are outputted. Program flow then terminates in the state 412.

Assuming the determination made in the state 406 is that the speaker is a caller from a hands-free telephone, then program flow moves from the state 406 to a state 416. In the state 416, a second pass is performed on the spoken input according to a speech recognition technique which is specifically tailored to perform recognition of speech originating from speakers calling from hands-free telephones. For example, the second pass technique performed in the state 416 can be based upon templates of words or utterances formed by taking samples of speech of persons calling from hands-free telephones or, in the case of a stochastic technique, upon models formed by taking samples of speech of persons calling from hands-free telephones.

Because the speech recognition technique performed in the state 416 is specifically tailored to speakers calling from hands-free telephones, the technique can perform speech recognition in less time for a given accuracy in comparison to a technique which is not so tailored. In addition, the technique performed in the state 416 can be pre-configured to require more processing time or capability than the techniques performed in the second pass states 408 and 414 so as to achieve a higher degree of accuracy for hands-free telephones than would otherwise be the case. However, because this second pass technique is performed only when necessary (i.e. when the caller is calling from a hands-free telephone), the average recognition speed is greatly enhanced in comparison to prior systems without a significant increase in the rate of recognition errors.
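The selection among the states 408, 414 and 416 amounts to a dispatch on the category determined in the state 406. The following sketch assumes a hypothetical `Category` enumeration and hypothetical stand-in recognizers that return fixed scores; only the three categories and the state numbers are taken from the text.

```python
from enum import Enum
from typing import Callable, Dict

Scores = Dict[str, float]


class Category(Enum):
    FEMALE = "female"
    MALE = "male"
    HANDS_FREE = "hands-free"


# State of Fig. 4 in which the tailored second pass for each category runs.
STATE_FOR_CATEGORY = {
    Category.FEMALE: 408,
    Category.MALE: 414,
    Category.HANDS_FREE: 416,
}


def run_second_pass(
    audio: str,
    category: Category,
    recognizers: Dict[Category, Callable[[str], Scores]],
) -> Scores:
    """Run only the recognizer tailored to the determined category."""
    return recognizers[category](audio)


# Stand-in recognizers with invented scores, for illustration only.
toy_recognizers = {
    Category.FEMALE: lambda audio: {"Boston": 0.75, "Austin": 0.25},
    Category.MALE: lambda audio: {"Boston": 0.55, "Austin": 0.45},
    Category.HANDS_FREE: lambda audio: {"Boston": 0.60, "Austin": 0.40},
}

result = run_second_pass("utterance", Category.FEMALE, toy_recognizers)
print(STATE_FOR_CATEGORY[Category.FEMALE], result)
```

Keeping the recognizers in a table means a more expensive technique, such as the one for hands-free callers, is paid for only when its category is selected.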

The first pass can be performed in the state 404 by the speech recognizing system while the spoken input is still being received from the source 250. In addition, the determination made in the state 406 can be performed while the spoken input is still being received. This allows the second pass performed in the state 408, 414 or 416 to begin while the spoken input is still being received. Accordingly, the first pass performed in the state 404 can utilize only a portion of a spoken utterance to determine which second pass recognition technique to use (408, 414, or 416), while the second pass begins while the same utterance continues. For example, the spoken input can be stored in a first-in, first-out buffer as it is being received, beginning at a starting address. The second pass begins by removing the spoken input from the buffer beginning with the starting address while spoken input is still being stored in the buffer. This tends to minimize delay in comparison to performing the first pass or both passes only after the spoken input is received. It will be apparent, however, that the first pass or both passes can be performed after the spoken input is received.
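The first-in, first-out buffering described above can be illustrated with a small single-threaded simulation. The names `UtteranceBuffer` and `simulate`, and the `start_after` parameter (the number of frames received before the second pass starts), are hypothetical; a real system would likely run the receiver and the recognizer concurrently, which this sketch does not attempt.

```python
from collections import deque
from typing import List, Optional


class UtteranceBuffer:
    """First-in, first-out buffer for incoming frames of spoken input."""

    def __init__(self) -> None:
        self._frames: deque = deque()

    def store(self, frame: str) -> None:
        self._frames.append(frame)

    def remove(self) -> Optional[str]:
        """Remove the oldest frame, or return None if the buffer is empty."""
        return self._frames.popleft() if self._frames else None


def simulate(frames: List[str], start_after: int) -> List[str]:
    """Interleave storing frames with second pass consumption.

    The second pass starts once `start_after` frames have arrived and
    reads from the start of the buffer while later frames are still
    being stored.
    """
    buffer = UtteranceBuffer()
    consumed: List[str] = []
    for index, frame in enumerate(frames):
        buffer.store(frame)                 # input still being received
        if index + 1 >= start_after:        # second pass already running
            item = buffer.remove()
            if item is not None:
                consumed.append(item)
    while True:                             # input ended: drain the rest
        item = buffer.remove()
        if item is None:
            break
        consumed.append(item)
    return consumed


print(simulate(list("abcdef"), start_after=3))
```

The second pass sees the frames in their original order even though it starts before the utterance has ended.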

According to an alternate aspect of the present invention illustrated in Fig. 4, multiple selected ones of the speech recognition techniques can be performed in the states 408, 414 and 416 for spoken input prior to outputting the results. Thus, returning to the example above where the spoken input is the word "Boston", assume that a result of the second pass performed in the state 408 is a certainty of seventy-five percent (75%) assigned to the expression: "Boston" and a certainty of twenty-five percent (25%) assigned to the alternative expression: "Austin". In which case, the certainty that the spoken input is correctly recognized is seventy-five percent (75%). Performance of another pass in the state 414 or 416 would likely improve the certainty that the spoken input is correctly recognized and reduce the frequency of errors. Accordingly, when the highest score or certainty obtained in one of the states 408, 414 or 416 is below a predetermined threshold (e.g., 80%) another pass is made in one or more of the other states 408, 414, 416.

Thus, returning to the example, because the certainty of seventy-five percent (75%) is 
lower than the predetermined threshold of eighty percent (80%), program flow moves from 
the state 408 to the state 414. It will be understood that program flow can move from any of 
the states 408, 414, or 416 to any other one of the states 408, 414, or 416, as appropriate, 
though corresponding pathways are not shown in Fig. 4 to avoid obscuring the invention. 
Once two or more of the techniques of the states 408, 414 and 416 are performed, the 
certainties are combined and the results outputted based upon the combined certainties. For 
example, a maximization scheme can be utilized to determine which expression is selected as 
the output. According to this maximization technique, the expression with the highest 
assigned score or certainty is selected as the output. In the example, the highest certainty is 
seventy-five percent (75%) assigned to the expression: "Boston". Accordingly, the term 
"Boston" is selected as the output. Note that the expression "Boston" would preferably be 
selected due to its score of seventy-five percent (75%) even if multiple other speech 
recognition passes selected the expression "Austin" with certainties less than seventy-five 

percent (75%). 

It will be apparent, however, that the evaluation of scores or confidences can be 
performed according to other mathematical techniques. For example, the alternative 
expression having the highest average score or certainty can be selected as the output. 
Returning to the example, assume that a result of the pass performed in the state 414 is a certainty of fifty-five percent (55%) assigned to the expression: "Boston" and a certainty of forty-five percent (45%) assigned to the alternative expression: "Austin". The average certainty assigned to the expression "Boston" in the states 408 and 414 is, therefore, sixty-five percent (65%) whereas the average certainty assigned to the expression "Austin" in the states 408 and 414 is thirty-five percent (35%). Accordingly, the term "Boston" is selected as the output.
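The two combination schemes discussed above, maximization and averaging, can be worked through with the figures from the example: 75% "Boston" and 25% "Austin" in the state 408, and 55% "Boston" and 45% "Austin" in the state 414. The function names `combine_max` and `combine_average` are hypothetical, and the sketch assumes every pass scores the same set of expressions.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]


def combine_max(passes: List[Scores]) -> Tuple[str, Scores]:
    """Select the expression with the single highest score in any pass."""
    best: Scores = {}
    for scores in passes:
        for expression, score in scores.items():
            best[expression] = max(best.get(expression, 0.0), score)
    return max(best, key=best.get), best


def combine_average(passes: List[Scores]) -> Tuple[str, Scores]:
    """Select the expression with the highest average score over all passes."""
    totals: Scores = {}
    for scores in passes:
        for expression, score in scores.items():
            totals[expression] = totals.get(expression, 0.0) + score
    averages = {e: total / len(passes) for e, total in totals.items()}
    return max(averages, key=averages.get), averages


state_408 = {"Boston": 0.75, "Austin": 0.25}
state_414 = {"Boston": 0.55, "Austin": 0.45}

winner_max, maxima = combine_max([state_408, state_414])
winner_avg, averages = combine_average([state_408, state_414])
print(winner_max, maxima)
print(winner_avg, {e: round(a, 2) for e, a in averages.items()})
```

Both schemes select "Boston" here; they can disagree when one pass gives a single high score to an expression that the other passes score poorly.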

Further, multiple ones of the speech recognition techniques can be selected based upon the results of the first pass performed in the state 404. For example, assume that in the state 406 the results of the first pass are equivocal as to whether or not the speaker is calling from a hands-free telephone. In which case, a pass can be performed according to the technique of the state 416 and a pass can be simultaneously performed according to the technique of one or both of the states 408 and 414. The expression which receives the highest combined score or certainty is then selected as the output.

Still further, the first pass performed in the state 404 can output recognized words with their phonetic alignments. Each of the second pass techniques can then be performed on a selected portion of the spoken input based upon the alignments. The selected portion of the spoken input can be a selected phoneme or multiple selected phonemes, such as corresponding to a particular vowel or vowels. Once an appropriate second pass technique is selected, use of the others can be discontinued. If none of the second pass recognition techniques results in a clear selection, all of the second pass recognition techniques (in the states 408, 414 and 416) can be performed and their results combined to form the output. Alternately, second pass techniques can be selected by applying Gaussian Mixture Models (GMMs) specific to each technique to the input speech. The selected second pass is the one corresponding to the GMM which best matches the input speech.

In addition, the second pass performed in the states 408, 414, or 416 can start from scratch by disregarding the results of the first pass. For example, the results of the first pass
could be found to be unreliable (e.g., where the highest score or certainty is extremely low) as may occur if the first pass acoustic model was improper (e.g., the call was placed from a hands-free cellular telephone but the model was based upon land-based, non-hands-free samples). A further alternative can include omitting the first pass altogether and, instead, determining the characteristics of the spoken input by a supplementary input by the caller. For example, the caller can be prompted to press the number "1" (one) on the keypad of his/her telephone if calling from a hands-free telephone. Alternately, the caller can be prompted to respond verbally to an inquiry regarding his/her telephone channel type.
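Selecting a second pass technique by finding the Gaussian Mixture Model which best matches the input speech can be sketched as a maximum log-likelihood choice. This is a toy illustration under loud assumptions: the feature is a single scalar per frame, and the mixture weights, means and variances below are invented values, not parameters from the specification. Practical recognizers typically score multi-dimensional acoustic feature vectors.

```python
import math
from typing import Dict, List, Tuple

# One mixture component: (weight, mean, variance) over a scalar feature.
Component = Tuple[float, float, float]


def gmm_log_likelihood(samples: List[float], components: List[Component]) -> float:
    """Total log-likelihood of the samples under a one-dimensional GMM."""
    total = 0.0
    for x in samples:
        density = sum(
            weight * math.exp(-((x - mean) ** 2) / (2.0 * var))
            / math.sqrt(2.0 * math.pi * var)
            for weight, mean, var in components
        )
        total += math.log(density)
    return total


def select_technique(samples: List[float], gmms: Dict[str, List[Component]]) -> str:
    """Pick the technique whose GMM best matches the input samples."""
    return max(gmms, key=lambda name: gmm_log_likelihood(samples, gmms[name]))


# Hypothetical per-technique models with invented parameters.
toy_gmms = {
    "state_408_female": [(0.6, 210.0, 400.0), (0.4, 240.0, 400.0)],
    "state_414_male": [(0.7, 120.0, 400.0), (0.3, 140.0, 400.0)],
}

print(select_technique([205.0, 215.0, 230.0], toy_gmms))
print(select_technique([118.0, 125.0, 131.0], toy_gmms))
```

Summing log densities rather than multiplying densities avoids numerical underflow when many frames are scored.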

Fig. 5 illustrates a second alternate flow diagram of a multi-pass speech recognition system in accordance with the present invention. The flow diagram of Fig. 5 illustrates graphically operation of the speech recognizing system 200 illustrated in Fig. 2 in accordance with the present invention. Program flow begins in a start state 500. From the state 500, program flow moves to a state 502. In the state 502, the speech recognizing system 200 receives spoken input from the source 250. Then program flow moves to a state 504.

In the state 504, a first pass is made during which the spoken input is processed by the speech recognizing system 200 according to a first speech recognizing technique. In the preferred embodiment, the first pass is performed by the speech recognizing system 200 while the speech is still being received from the source of speech 250. This tends to minimize delay in comparison to performing the first pass after the spoken input is received, though it will be apparent that the first pass can alternately be performed after the spoken input is received.

Program flow moves from the state 504 to a state 506. In the state 506, a determination is made as to whether the score or certainty of the results of the first pass performed in the state 504 exceeds a predetermined threshold. The determination made in the state 506 can be the same as is made in the state 306 described above in reference to Fig. 3. Assuming that the determination made in the state 506 is positive, program flow moves to a state 508. In the state 508, the expression having the highest assigned score or certainty as a result of the first pass is outputted. Program flow then terminates in a state 510.

Assuming, however, that the determination made in the state 506 is negative, then program flow moves to a state 512. In the state 512, a determination is made as to characteristics of the spoken input based upon the results of the first pass. The determination made in the state 512 can be the same as the determination made in the state 406 described above in reference to Fig. 4. This determination is then utilized to select a most appropriate speech recognition technique (or a most appropriate speech recognizing system) from a plurality of speech recognizing techniques (or speech recognizing systems) for performing a second pass on the spoken input. For example, the first pass can include an attempt to distinguish speakers who are female from speakers who are male, callers who are calling from a hands-free type of telephone from a handset type of telephone and/or the particular accent or dialect of the speaker.

In the preferred embodiment, the spoken input is placed into one of two categories: (1) originating from a female speaker; and (2) originating from a male speaker. The spoken input, however, can be placed into other categories or another number of categories. For example, as shown in Fig. 5, the spoken input can be categorized as one of three categories: (1) originating from a female speaker; (2) originating from a male speaker; and (3) originating from a hands-free telephone where the speaker is female or male.

iromananusi f , h » « ale 512 is that the speaker is female, then 

Assuming the determination made in the stole 512 is mat in v 
program flow moves from the state 5,2 to a state 514. In fte state 514, a second pass ,s 
performed on fte spoken input according to a speech recognition technique which is 
C : y tai,ored to perform recognition of speech originate from femaie speakers. Prom 
r;!: 5,4, program flow moves to fte state 50, where fte results of fte second pass are 
outnutted and then terminates in the state 510. 

Similarly, assuming the determination made in the state 512 is that the speaker is male,
then program flow moves from the state 512 to a state 516. In the state 516, a second pass is
performed on the spoken input according to a speech recognition technique which is
specifically tailored to perform recognition of speech originating from male speakers. From
the state 516, program flow moves to the state 508, where the results of the second pass are
outputted, and then terminates in the state 510.

Assuming the determination made in the state 512 is that the speaker is calling from a
hands-free telephone, then program flow moves from the state 512 to a state 518. In the state
518, a second pass is performed on the spoken input according to a speech recognition
technique which is specifically tailored to perform recognition of speech originating from
hands-free telephone callers. From the state 518, program flow moves to the state 508, where
the results of the second pass are outputted, and then terminates in the state 510.
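
The branch from the state 512 into the states 514, 516 and 518 can be pictured as a lookup from the category found by the first pass to a recognizer tailored to that category. The following is only a minimal sketch: the function names, the category strings and the placeholder return values are hypothetical and do not appear in the specification.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the tailored techniques of states 514, 516 and 518.
def recognize_female(audio: bytes) -> str:
    return "female-model result"

def recognize_male(audio: bytes) -> str:
    return "male-model result"

def recognize_hands_free(audio: bytes) -> str:
    return "hands-free-model result"

# Assumed category labels produced by the first pass (state 512).
SECOND_PASS: Dict[str, Callable[[bytes], str]] = {
    "female": recognize_female,          # state 514
    "male": recognize_male,              # state 516
    "hands_free": recognize_hands_free,  # state 518
}

def second_pass(category: str, audio: bytes) -> str:
    """Select the technique matching the first-pass category and run it."""
    return SECOND_PASS[category](audio)
```

A table of this kind keeps the selection step separate from the techniques themselves, so further categories could be added without changing the selection logic.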

In the preferred embodiment, the first pass is performed in the state 504 by the speech
recognizing system 200 while the spoken input is still being received from the source 250.

- —tons made ,n the « - — tons made in 

the states an Qnes of the 

In another aspect of the present invention illustrated in Fig. 5, more than one of the speech
recognition techniques can be performed in the states 514, 516 and 518 for a spoken
input. Which techniques are performed can be based upon the results of the first pass
performed in the state 504. For example, when the result of the decision made in the state 512
is uncertain as to whether or not the speaker is calling from a hands-free telephone, a pass
can be made according to the technique of state 518 and a pass can also be made according to
the technique of one or both of the states 514 and 516. The expression which receives the
highest maximum score or certainty is then selected as the output.

Alternately, when the highest score or certainty obtained in one of the states 514, 516
and 518 is below a predetermined threshold (e.g., 80%), another pass is made in one or more
of the other states 514, 516 and 518. As an example, program flow can move from the state
514 to the state 516 when the results of the state 514 are below the predetermined threshold.
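
The threshold-driven fallback among the states 514, 516 and 518 can be sketched as a loop that tries one technique at a time, keeps the best-scoring expression seen so far, and stops once the certainty reaches the threshold. The function name, the `(expression, certainty)` result shape and the ordering argument below are assumptions made for illustration; only the 80% example threshold comes from the text.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, float]  # (recognized expression, certainty in [0, 1])

def recognize_with_fallback(
    audio: bytes,
    techniques: Dict[str, Callable[[bytes], Result]],
    order: List[str],
    threshold: float = 0.80,
) -> Result:
    """Run techniques in the given order until one is certain enough."""
    best: Result = ("", 0.0)
    for name in order:
        result = techniques[name](audio)
        # Keep the expression with the highest certainty so far.
        if result[1] > best[1]:
            best = result
        # Stop as soon as the predetermined threshold is met.
        if best[1] >= threshold:
            break
    return best
```

If no technique reaches the threshold, the sketch returns the highest-scoring expression from all passes, which corresponds to selecting the expression with the highest score or certainty.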

It will be understood that program flow can move from any one of the states 514, 516 and 518
to any other one of the states 514, 516 and 518 as appropriate, though the corresponding pathways
are not shown in Fig. 5 to avoid obscuring the invention. When more than one of the
techniques of the states 514, 516 and 518 are performed, the results can be combined
and the result outputted based upon the combination.

As illustrated in Fig. 5, if the first
pass identifies the spoken input with a desired degree of certainty (e.g., by
comparing a certainty assigned to alternative expressions to a predetermined threshold),
then no additional passes need be performed. However, if the first pass does not identify the
spoken input with the desired degree of certainty, then one or more additional passes are
selectively performed, where the selection is based upon information the first
pass provides regarding the spoken input. This information can
include whether the spoken input originated from a female speaker, a male speaker or a caller
from a hands-free telephone. As a result, processing capability of the
speech recognition system 200 is selectively allocated as needed, and processing is
performed while the spoken input is being received (i.e.,
"on-the-fly"). In comparison to prior systems, the invention reduces the time taken to
recognize the spoken input while maintaining a high degree of accuracy. The
thresholds utilized for selecting additional passes can
be precisely tailored to provide the desired trade-off between speed and accuracy.

Fig. 6 illustrates a third alternate flow diagram of a multi-pass speech recognition
system in accordance with the present invention. The flow diagram of Fig. 6 illustrates
graphically operation of the speech recognizing system 200 illustrated in Fig. 2 in accordance
with the present invention. Portions of Fig. 6 which have a one-to-one functional
correspondence with those of Fig. 5 are given the same reference numerals and are not
discussed further.
Referring to Fig. 6, from the states 514, 516 or 518, program flow moves to a state 600
where a determination is made as to whether the results of the passes performed in these
states are sufficiently definite. Thus, the state 600 performs the same function as the state
306 (Fig. 3) and 506 (Figs. 5-6). Assuming the results are sufficiently definite, then no additional
pass need be performed. Accordingly, program flow moves to the state 508, where the
results are outputted and, then, program flow terminates in the state 510. If the
results of the second pass are not sufficiently definite, then a third pass can be
performed. In the preferred embodiment, the third pass is performed according to the accent or
dialect of the speaker, though the third pass can be performed
according to other characteristics.

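Thus, from the state 600, program flow moves to a state 602, where the
results of one or more of the prior passes are used to select a speech recognition
technique for the third pass. For example, the results of the prior passes can be used to
determine whether the speech is in accordance with English spoken in the United Kingdom or
in the United States. Assuming that it is determined that the speech is United Kingdom
English, then program flow moves to a state 604, where a speech recognition technique
specifically tailored to United Kingdom English is utilized. Otherwise, program flow moves
to a state where a speech recognition technique specifically tailored to United
States English is utilized. From either of these states, program flow moves to the state
508, where the results are outputted. Then, program flow terminates in the state 510.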
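
The additional stage of Fig. 6 can be sketched as follows: the results of the second pass are first checked for definiteness (state 600), and only when they fall short is a dialect chosen (state 602) and a dialect-specific third pass run. All names, the `(expression, certainty)` result shape, the dialect labels and the use of a numeric threshold as the definiteness test are illustrative assumptions rather than details taken from the specification.

```python
from typing import Callable, Dict, Tuple

Result = Tuple[str, float]  # (recognized expression, certainty in [0, 1])

def third_pass_if_needed(
    audio: bytes,
    second_result: Result,
    threshold: float,
    detect_dialect: Callable[[Result], str],
    dialect_recognizers: Dict[str, Callable[[bytes], Result]],
) -> Result:
    # State 600: are the second-pass results sufficiently definite?
    if second_result[1] >= threshold:
        return second_result
    # State 602: use prior-pass results to choose a dialect (e.g. "uk" or "us").
    dialect = detect_dialect(second_result)
    # Third pass with the technique tailored to that dialect.
    return dialect_recognizers[dialect](audio)
```

When the second pass is already definite, the function returns immediately and no third pass is performed, so the extra cost is paid only for difficult inputs.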

Therefore, as illustrated in Fig. 6, one, two or three passes are
selectively performed according to results of the prior passes.

As a modification to the invention, silence chopping can be
utilized to reduce the amount of input to the second and subsequent passes based
upon the phonetic alignments determined by the first pass. A
margin can be employed before and after each utterance, which retains a
small amount of silence before and after each utterance. Silence chopping is expected
to reduce the processing and, thus, the time, required by the second and subsequent passes.
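
Silence chopping can be illustrated with a short sketch that trims leading and trailing silence from an utterance using the alignment produced by the first pass, while retaining a small margin on each side. The segment format `(start, end, label)`, the `"sil"` label and the function name are assumptions made for this example; the specification does not define a data format for the alignments.

```python
from typing import List, Sequence, Tuple

# (start_sample, end_sample, label); end is exclusive. "sil" marks silence.
Segment = Tuple[int, int, str]

def chop_silence(
    samples: Sequence[int],
    alignment: List[Segment],
    margin: int,
) -> Sequence[int]:
    """Keep the speech portion of an utterance plus a margin of silence."""
    speech = [(start, end) for start, end, label in alignment if label != "sil"]
    if not speech:
        return samples[:0]  # nothing but silence: pass no input onward
    first = min(start for start, _ in speech)
    last = max(end for _, end in speech)
    # Retain a small amount of silence before and after the utterance.
    lo = max(0, first - margin)
    hi = min(len(samples), last + margin)
    return samples[lo:hi]
```

For instance, with 100 samples, speech aligned to samples 30 through 69 and a margin of 5, the later passes would receive samples 25 through 74 rather than all 100.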

As another modification to the invention illustrated in Figs. 2 and 4-6, the first pass
can be omitted for a sentence based upon the results of a first or later pass performed on a
prior sentence. For example, if a prior utterance during a call is from a male speaker, then
the next utterance would also be expected to be from a male speaker. Alternately, the model
utilized for the first pass can be varied (e.g., selected from a plurality of speech [...]) [...]
departing from the spirit and scope of the invention.